Finite field arithmetic logic is central in the implementation of Reed-Solomon coders
and in some cryptographic algorithms. There is a need for good multiplication and
inversion algorithms that can be easily realized on VLSI chips. Massey and Omura
recently developed a new multiplication algorithm for Galois fields based on a normal
basis representation. In this paper, a pipeline structure is developed to realize the
Massey-Omura multiplier in the finite field $GF(2^m)$. With the simple squaring property of
the normal-basis representation used together with this multiplier, a pipeline architecture
is also developed for computing inverse elements in $GF(2^m)$. The designs developed for
the Massey-Omura multiplier and the computation of inverse elements are regular, simple,
expandable and, therefore, naturally suitable for VLSI implementation.


I. Introduction 

Recently, Massey and Omura (Ref. 1) invented a multiplier
which obtains the product of two elements in the finite field
$GF(2^m)$. In their invention, they utilize a normal basis of the form
$\{\alpha, \alpha^2, \alpha^4, \ldots, \alpha^{2^{m-1}}\}$ to represent elements of the field,
where $\alpha$ is a root of an irreducible polynomial of degree $m$
over $GF(2)$. In this basis each element in the field $GF(2^m)$ can
be represented by $m$ binary digits.

In the normal basis representation the squaring of an element
in $GF(2^m)$ is readily shown to be a simple cyclic shift of
its binary digits. Multiplication in the normal basis representation
requires for any one product digit the same logic circuitry
as it does for any other product digit. Adjacent
product-digit circuits differ only in their inputs, which are
cyclically shifted versions of one another. In this paper, a
pipeline architecture suitable for VLSI design is developed for
a Massey-Omura multiplier on $GF(2^m)$.

The conventional methods for finding an inverse element in
a finite field use either table look-up or Euclid's algorithm.
These methods are not easily realized in a VLSI circuit. However,
using a Massey-Omura multiplier, a recursive, pipeline
inversion circuit is developed. This structure consists of four
sets of shift registers, one parallel-type Massey-Omura multiplier
and two control signals. Such a design is regular, simple
and expandable and, hence, naturally suitable for VLSI
implementation.

II. Squaring and Multiplying in a Normal 
Basis Representation 

In this section, the work originally described by Massey and
Omura (Ref. 1) is reviewed. It is well known that there always
exists a normal basis in the finite field $GF(2^m)$ (Ref. 2) for all
positive integers $m$. That is, one can find a field element $\alpha$
such that $N = \{\alpha, \alpha^2, \alpha^4, \ldots, \alpha^{2^{m-1}}\}$ is a basis set of
$GF(2^m)$. Thus every field element $\beta \in GF(2^m)$ can be uniquely
expressed as

$$\beta = b_0 \alpha + b_1 \alpha^2 + b_2 \alpha^4 + \cdots + b_{m-1} \alpha^{2^{m-1}} \qquad (1)$$

where $b_0, b_1, b_2, \ldots, b_{m-1}$ are binary digits and addition is
mod-2 addition.


Three useful properties of a finite field $GF(2^m)$ are stated
here without proof (for proofs see, for example, Ref. 2). These
properties are:

(1) Squaring in $GF(2^m)$ is a linear operation. That is, given
any two elements $\alpha$ and $\beta$ in $GF(2^m)$,

$$(\alpha + \beta)^2 = \alpha^2 + \beta^2 \qquad (2)$$

(2) For any element $\alpha$ of $GF(2^m)$,

$$\alpha^{2^m} = \alpha \qquad (3)$$

(3) If $\alpha$ is a root of any irreducible polynomial $P(x)$ of
degree $m$ over $GF(2)$, the powers $\alpha, \alpha^2, \alpha^4, \ldots, \alpha^{2^{m-1}}$
are in $GF(2^m)$ and constitute a complete set
of roots of $P(x)$.

With regard to property (3), Peterson and Weldon (Ref. 3) list
a set of irreducible polynomials of degree $m \le 34$ over $GF(2)$
for which the roots $\alpha, \alpha^2, \alpha^4, \ldots, \alpha^{2^{m-1}}$ are linearly
independent. These linearly independent roots clearly form a
normal basis of $GF(2^m)$.

Suppose that $\{\alpha, \alpha^2, \alpha^4, \ldots, \alpha^{2^{m-1}}\}$ is a normal basis of
$GF(2^m)$. By (2) and (3) the square of (1) is

$$\begin{aligned}
\beta^2 &= b_0 \alpha^2 + b_1 \alpha^4 + b_2 \alpha^8 + \cdots + b_{m-2} \alpha^{2^{m-1}} + b_{m-1} \alpha^{2^m} \\
        &= b_{m-1} \alpha + b_0 \alpha^2 + b_1 \alpha^4 + \cdots + b_{m-2} \alpha^{2^{m-1}}
\end{aligned} \qquad (4)$$

Thus, if $\beta$ is represented as a vector of components of the
normal basis elements of $GF(2^m)$ in the form $\beta = [b_0, b_1,
b_2, \ldots, b_{m-1}]$, then $\beta^2 = [b_{m-1}, b_0, b_1, \ldots, b_{m-2}]$. That is, in the
normal basis representation $\beta^2$ is a cyclic shift of $\beta$. Hence
squaring in $GF(2^m)$ can be realized physically by logic circuitry
which accomplishes cyclic shifts in a binary register.
Such squaring circuitry is illustrated in block form in Fig. 1.

By (2) and (3) it is readily seen that $1 = \alpha + \alpha^2 + \alpha^4 + \cdots +
\alpha^{2^{m-1}}$ for the normal-basis element $\alpha$: the sum is unchanged by
squaring, so it lies in $GF(2)$, and it cannot be 0 because the basis
elements are linearly independent. This implies that the
normal basis representation of 1 is $[1, 1, 1, \ldots, 1]$.

Let $\beta = [b_0, b_1, \ldots, b_{m-1}]$ and $\gamma = [c_0, c_1, \ldots, c_{m-1}]$
be two elements of $GF(2^m)$ in a normal basis representation.
Then the last term $d_{m-1}$ of the product,

$$\delta = \beta \cdot \gamma = [d_0, d_1, \ldots, d_{m-1}] \qquad (5)$$

is some binary function of the components of $\beta$ and $\gamma$, i.e.,

$$d_{m-1} = f(b_0, b_1, \ldots, b_{m-1};\; c_0, c_1, \ldots, c_{m-1}) \qquad (6)$$

Since squaring means a cyclic shift of an element in a normal
basis representation, one has

$$\begin{aligned}
\delta^2 = \beta^2 \cdot \gamma^2 &= [b_{m-1}, b_0, b_1, \ldots, b_{m-2}] \cdot [c_{m-1}, c_0, c_1, \ldots, c_{m-2}] \\
                                   &= [d_{m-1}, d_0, d_1, \ldots, d_{m-2}]
\end{aligned} \qquad (7)$$

Hence the last component $d_{m-2}$ of $\delta^2$ is obtained by the same
function $f$ in (6) operating on the components of $\beta^2$ and $\gamma^2$.
That is, $d_{m-2} = f(b_{m-1}, b_0, b_1, \ldots, b_{m-2};\; c_{m-1}, c_0,
c_1, \ldots, c_{m-2})$. By squaring $\delta$ repeatedly, it is evident that

$$\begin{aligned}
d_{m-1} &= f(b_0, b_1, \ldots, b_{m-1};\; c_0, c_1, \ldots, c_{m-1}) \\
d_{m-2} &= f(b_{m-1}, b_0, b_1, \ldots, b_{m-2};\; c_{m-1}, c_0, c_1, \ldots, c_{m-2}) \\
        &\;\;\vdots \\
d_0     &= f(b_1, b_2, \ldots, b_{m-1}, b_0;\; c_1, c_2, \ldots, c_{m-1}, c_0)
\end{aligned} \qquad (8)$$
The equations in (8) define the Massey-Omura multiplier.
In the normal basis representation this multiplier has the
property that the same logic function $f$ which is used to find
the last component $d_{m-1}$ of the product $\delta$ can be used to
find sequentially the remaining components $d_{m-2},
d_{m-3}, \ldots, d_0$ of the product. This feature of the product
operation requires only one logic function $f$ of the $2m$ components
of $\beta$ and $\gamma$ to sequentially compute the $m$ components of
the product.

Figure 2 illustrates the logic diagram of the above-described
sequential-type Massey-Omura multiplier on $GF(2^m)$. Alternately,
for parallel operation this feature permits the use of $m$
identical logic functions, $f$, for calculating simultaneously all
components of the product. In the latter case, the inputs to
the $m$ logic functions $f$ are connected directly to the components
of $\beta$ and $\gamma$. The only difference in the connections of the
components of $\beta$ or $\gamma$ to a function $f$ is that they are cyclically
shifted versions of one another. Figure 3 shows the structure
of the parallel-type Massey-Omura multiplier for the simple
case of $m = 4$. The extension of this type of structure to a
general case of $GF(2^m)$ is straightforward.
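
The shift structure of (8) can be sketched in software. The following Python sketch is only an illustration under stated assumptions: the names `cyclic_shift`, `multiply_sequential` and `multiply_parallel` are hypothetical, and the small product function `f_gf4` is a toy example for $GF(2^2)$ (normal basis $\{\alpha, \alpha^2\}$ with $\alpha^2 = \alpha + 1$) worked out for this sketch; it is not taken from the design described here.

```python
from typing import Callable, List

Bits = List[int]  # [b_0, ..., b_{m-1}]: coefficients of alpha^(2^i)
ProductFn = Callable[[Bits, Bits], int]


def cyclic_shift(v: Bits) -> Bits:
    """Squaring in a normal basis: [b_0..b_{m-1}] -> [b_{m-1}, b_0..b_{m-2}]."""
    return v[-1:] + v[:-1]


def multiply_sequential(beta: Bits, gamma: Bits, f: ProductFn) -> Bits:
    """One copy of f, m steps: produces d_{m-1}, d_{m-2}, ..., d_0 as in (8)."""
    m = len(beta)
    d = [0] * m
    for k in range(m):
        d[m - 1 - k] = f(beta, gamma)
        beta, gamma = cyclic_shift(beta), cyclic_shift(gamma)
    return d


def multiply_parallel(beta: Bits, gamma: Bits, f: ProductFn) -> Bits:
    """m copies of f, each wired to a differently shifted view of the inputs."""
    m = len(beta)

    def shifted(v: Bits, k: int) -> Bits:
        # k-fold cyclic shift expressed as fixed wiring instead of a register
        return [v[(i - k) % m] for i in range(m)]

    return [f(shifted(beta, m - 1 - j), shifted(gamma, m - 1 - j))
            for j in range(m)]


def f_gf4(b: Bits, c: Bits) -> int:
    """Toy product function d_1 for GF(2^2) (assumption of this sketch)."""
    return (b[0] & c[0]) ^ (b[0] & c[1]) ^ (b[1] & c[0])


# alpha * alpha^2 = alpha^3 = 1, whose normal-basis representation is [1, 1]
print(multiply_sequential([1, 0], [0, 1], f_gf4))  # [1, 1]
```

The sequential form trades time for area (one $f$, $m$ steps), while the parallel form uses $m$ copies of $f$ and no shifting; both return the same digits.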


III. A Pipeline Structure for Implementing 
Massey-Omura Multiplier 

A detailed design of a Massey-Omura multiplier is now
developed for the finite field $GF(2^4)$. As illustrated in Figs. 2
and 3, the design of either the sequential-type or parallel-type
Massey-Omura multiplier must focus on the product function $f$.

The design of $f$ begins with the selection of an irreducible
polynomial $P(x) = x^4 + x^3 + 1$ of degree $m = 4$ over $GF(2)$.
This particular polynomial has linearly independent
roots, namely, $\alpha$, $\alpha^2$, $\alpha^4$ and $\alpha^8$. Hence, the set of roots
$\{\alpha, \alpha^2, \alpha^4, \alpha^8\}$ constitutes a normal basis of $GF(2^4)$. Any two
elements $\beta$ and $\gamma$ in $GF(2^4)$ can be expressed as

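$$\begin{aligned}
\beta  &= b_0 \alpha + b_1 \alpha^2 + b_2 \alpha^4 + b_3 \alpha^8 \\
\gamma &= c_0 \alpha + c_1 \alpha^2 + c_2 \alpha^4 + c_3 \alpha^8
\end{aligned} \qquad (9)$$

By (9) the product of $\beta$ and $\gamma$ is

$$\begin{aligned}
\delta = \beta \cdot \gamma &= (b_0 \alpha + b_1 \alpha^2 + b_2 \alpha^4 + b_3 \alpha^8)
                               \cdot (c_0 \alpha + c_1 \alpha^2 + c_2 \alpha^4 + c_3 \alpha^8) \\
                            &= d_0 \alpha + d_1 \alpha^2 + d_2 \alpha^4 + d_3 \alpha^8
\end{aligned} \qquad (10)$$

By (10) and the fact that $\alpha^4 = \alpha^3 + 1$, one obtains

$$\begin{aligned}
d_3 &= b_2 c_2 + b_3 c_2 + b_2 c_3 + b_3 c_1 + b_1 c_3 + b_3 c_0 + b_0 c_3 + b_1 c_0 + b_0 c_1 \\
d_2 &= b_1 c_1 + b_2 c_1 + b_1 c_2 + b_2 c_0 + b_0 c_2 + b_2 c_3 + b_3 c_2 + b_0 c_3 + b_3 c_0 \\
d_1 &= b_0 c_0 + b_1 c_0 + b_0 c_1 + b_1 c_3 + b_3 c_1 + b_1 c_2 + b_2 c_1 + b_3 c_2 + b_2 c_3 \\
d_0 &= b_3 c_3 + b_0 c_3 + b_3 c_0 + b_0 c_2 + b_2 c_0 + b_0 c_1 + b_1 c_0 + b_2 c_1 + b_1 c_2
\end{aligned} \qquad (11)$$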

Comparing (11) with (8), the function $f$ is given by

$$\begin{aligned}
f(b_0, b_1, b_2, b_3;\; c_0, c_1, c_2, c_3)
  = {} & b_2 c_2 + b_3 c_2 + b_2 c_3 + b_3 c_1 + b_1 c_3 \\
       & + b_3 c_0 + b_0 c_3 + b_1 c_0 + b_0 c_1
\end{aligned} \qquad (12)$$
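
The product function in (12) can be cross-checked numerically. The sketch below is an illustration, not part of the original design: it multiplies in a polynomial basis modulo $P(x) = x^4 + x^3 + 1$, converts the result to the normal basis $\{\alpha, \alpha^2, \alpha^4, \alpha^8\}$ by table look-up, and lists which products $b_i c_j$ contribute to $d_3$. The helper names (`gf_mul`, `from_normal`, `TO_NORMAL`, `D3_TERMS`) are hypothetical.

```python
from itertools import product

M = 4
POLY = 0b11001  # P(x) = x^4 + x^3 + 1


def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^4), polynomial basis; bit i is the coefficient of alpha^i."""
    r = 0
    for i in range(M):
        if (b >> i) & 1:
            r ^= a << i
    for i in range(2 * M - 2, M - 1, -1):  # reduce modulo P(x)
        if (r >> i) & 1:
            r ^= POLY << (i - M)
    return r


# Normal basis [alpha, alpha^2, alpha^4, alpha^8], built by repeated squaring.
BASIS = [0b0010]
for _ in range(M - 1):
    BASIS.append(gf_mul(BASIS[-1], BASIS[-1]))


def from_normal(bits) -> int:
    """Normal-basis digits [b_0..b_3] -> polynomial-basis integer."""
    x = 0
    for bit, e in zip(bits, BASIS):
        if bit:
            x ^= e
    return x


# Inverse map, valid because the four basis elements are linearly independent.
TO_NORMAL = {from_normal(v): v for v in product((0, 1), repeat=M)}

# Index pairs (i, j) for which alpha^(2^i) * alpha^(2^j) has a d_3 component.
D3_TERMS = sorted((i, j) for i in range(M) for j in range(M)
                  if TO_NORMAL[gf_mul(BASIS[i], BASIS[j])][3])


def f(b, c) -> int:
    """Product function of (12), written out term by term."""
    b0, b1, b2, b3 = b
    c0, c1, c2, c3 = c
    return ((b2 & c2) ^ (b3 & c2) ^ (b2 & c3) ^ (b3 & c1) ^ (b1 & c3)
            ^ (b3 & c0) ^ (b0 & c3) ^ (b1 & c0) ^ (b0 & c1))


print(D3_TERMS)  # the nine (i, j) pairs appearing in (12)
```

The nine index pairs recovered this way are the same nine products that appear in (12), which is a useful sanity check on the reconstructed equations.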


Since the mod-2 sum in (12) can be implemented by the
"exclusive or" operation (XOR), the structure of the product
function $f$ can be represented by the logic circuit in Fig. 4.
This circuit consists of two portions: the left half is an AND
plane which computes each term of (12), while the right half is an
XOR plane which computes the mod-2 sum. The inputs to the
AND plane are the complements of the components of $\beta$ and
$\gamma$. This is due to the fact that the AND operation in the AND
plane is obtained by the NOR operation on the complements
of the two digits being ANDed, i.e., $xy = \overline{\bar{x} + \bar{y}}$ where $\bar{x}$ is the
complement of $x$.
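
The two-plane organization can be mimicked in a few lines. This is a behavioral sketch only, assuming the nine terms of (12); the names `nor`, `TERMS` and `f_and_xor` are hypothetical and say nothing about the actual transistor-level layout of Fig. 4.

```python
def nor(x: int, y: int) -> int:
    """Two-input NOR on single bits."""
    return 1 - (x | y)


# Index pairs (i, j) of the products b_i * c_j summed in (12).
TERMS = [(2, 2), (3, 2), (2, 3), (3, 1), (1, 3), (3, 0), (0, 3), (1, 0), (0, 1)]


def f_and_xor(b, c) -> int:
    """Evaluate f as an AND plane (NOR of complements) followed by an XOR plane."""
    nb = [1 - x for x in b]  # inverters in front of the AND plane
    nc = [1 - x for x in c]
    and_plane = [nor(nb[i], nc[j]) for i, j in TERMS]  # x*y = NOR(not x, not y)
    out = 0
    for term in and_plane:  # XOR plane: mod-2 sum of all AND terms
        out ^= term
    return out


print(f_and_xor([0, 0, 1, 0], [0, 0, 1, 0]))  # only b2*c2 is active -> 1
```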

A pipeline structure of a Massey-Omura multiplier for
$GF(2^4)$ is shown in Fig. 5. This structure has a sequential type
of operation. For each of the two inputs to the $f$ function,
corresponding to $\beta$ and $\gamma$, an inverter, two sets of shift registers,
$B$ and $R$, and 11 gate transistors are utilized. Note that registers
$B$ and $R$ have an identical circuit structure.

In Fig. 5 during the first three clock cycles, when signal
Ld = 0, the complements of $b_3$, $b_2$, $b_1$ and $c_3$, $c_2$, $c_1$ are fed
sequentially into three buffer flip-flops $B_k$ for ($k = 1, 2, 3$). At
the fourth clock cycle, when Ld = 1, the values of $b_3$, $b_2$, $b_1$
and $c_3$, $c_2$, $c_1$, previously stored in buffer registers $B_k$, and $b_0$
and $c_0$ are shifted into the second set of registers $R_k$ for
($k = 1, 2, 3, 4$). Then the $R$-registers are cyclically shifted.
Such a cyclic-shift operation is needed to sequentially yield
the product components $d_2$, $d_1$ and $d_0$ of $\delta$. While the
$R$-registers are cyclically shifting the components of $\beta$ (or $\gamma$),
the components of another element in $GF(2^4)$ following $\beta$
(or $\gamma$) can be fed into the buffer $B$-registers. Therefore, the
structure in Fig. 5 provides a pipeline operation in which no
time is lost except for an initial fixed time delay. The VLSI
layout of a Massey-Omura multiplier for $GF(2^4)$ is shown in
Fig. 6.
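
The timing of this pipeline can be illustrated with a clock-by-clock behavioral model. The sketch below is an assumption-laden simplification, not a description of the actual circuit: it works on true values rather than complements, models the $B$ and $R$ registers as Python lists, asserts Ld on every fourth clock, and assumes the product digit is available in the same clock in which the $R$-registers are loaded or shifted. The names `run_pipeline` and `rotate` are hypothetical.

```python
def f(b, c) -> int:
    """Product function of (12) for GF(2^4)."""
    b0, b1, b2, b3 = b
    c0, c1, c2, c3 = c
    return ((b2 & c2) ^ (b3 & c2) ^ (b2 & c3) ^ (b3 & c1) ^ (b1 & c3)
            ^ (b3 & c0) ^ (b0 & c3) ^ (b1 & c0) ^ (b0 & c1))


def rotate(v):
    """Cyclic shift of the R-registers (squaring in the normal basis)."""
    return v[-1:] + v[:-1]


def run_pipeline(pairs):
    """pairs: list of (beta, gamma), each [x_0, x_1, x_2, x_3]; returns products."""
    # Each element enters serially as x_3, x_2, x_1, x_0; three idle clocks
    # are appended so the last product can drain out of the R-registers.
    stream = [(b[i], c[i]) for b, c in pairs for i in (3, 2, 1, 0)] + [(0, 0)] * 3
    buf_b, buf_c = [], []      # B-registers: hold x_3, x_2, x_1
    reg_b = reg_c = None       # R-registers: hold [x_0, x_1, x_2, x_3]
    serial_out = []
    for t, (bit_b, bit_c) in enumerate(stream, start=1):
        ld = (t % 4 == 0)
        if ld:                 # Ld = 1: load R from B plus the incoming x_0
            reg_b = [bit_b, buf_b[2], buf_b[1], buf_b[0]]
            reg_c = [bit_c, buf_c[2], buf_c[1], buf_c[0]]
            buf_b, buf_c = [], []
        else:                  # Ld = 0: buffer next element, shift current one
            buf_b.append(bit_b)
            buf_c.append(bit_c)
            if reg_b is not None:
                reg_b, reg_c = rotate(reg_b), rotate(reg_c)
        if reg_b is not None:
            serial_out.append(f(reg_b, reg_c))  # emits d_3, d_2, d_1, d_0
    # Regroup the serial digits d_3..d_0 into vectors [d_0, d_1, d_2, d_3].
    return [serial_out[i:i + 4][::-1] for i in range(0, len(serial_out), 4)]


# alpha * alpha = alpha^2, then 1 * alpha = alpha, back to back
print(run_pipeline([([1, 0, 0, 0], [1, 0, 0, 0]),
                    ([1, 1, 1, 1], [1, 0, 0, 0])]))
# [[0, 1, 0, 0], [1, 0, 0, 0]]
```

After the initial three-clock fill, one product digit leaves the model on every clock, which is the "no time is lost" property described above.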

Figure 7 illustrates a system structure of a pipelined
Massey-Omura multiplier for $GF(2^m)$. For this general case
over $GF(2^m)$, the buffer and the cyclic shift mechanism in
Fig. 7 have $m - 1$ and $m$ stages, respectively. Each stage
consists of a shift register and a gate transistor. The product
function $f$ is a mod-2 sum of AND products of the components
of the two inputs being multiplied. Such a circuit for
function $f$ consists of an AND programmed logic array (PLA)
(Ref. 4) followed by an XOR sequential-PLA. In the XOR
sequential-PLA there are several levels of XORs. At each level,
the inputs, pair-by-pair, are fed sequentially one-by-one into
an XOR as shown in Fig. 4.

Let $n(j)$ be the number of XOR circuits at the $j$-th level of
the XOR sequential-PLA. Then $n(j + 1) = \lceil n(j)/2 \rceil$ where
$\lceil x \rceil$ is the smallest integer greater than or equal to $x$ and where initially,
$n(0)$ = total number of terms to be XORed in product function
$f$. At the last level, there is only one XOR circuit and the
output is the value of $f$. In general, if $k$ denotes the number
of levels required in the XOR sequential-PLA, $k = \lceil \log_2 n(0) \rceil$.
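
As a small worked example of this recurrence, the sketch below (the function name `xor_levels` is hypothetical) lists $n(0), n(1), \ldots$ for the nine terms of the $GF(2^4)$ product function in (12).

```python
def xor_levels(n0: int) -> list:
    """Return [n(0), n(1), ...] with n(j+1) = ceil(n(j) / 2), down to 1."""
    levels = [n0]
    while levels[-1] > 1:
        levels.append((levels[-1] + 1) // 2)  # integer form of ceil(n / 2)
    return levels


levels = xor_levels(9)          # (12) has 9 AND terms to be XORed
print(levels, len(levels) - 1)  # [9, 5, 3, 2, 1] 4
```

Here $k = 4 = \lceil \log_2 9 \rceil$, in agreement with the formula for the number of levels.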

It should be noted that as $m$ gets large, the number of
mod-2 sums in the function $f$ becomes large. In this case, more
XORs and as a consequence more levels in the XOR sequential-PLA
are required. To maximize the pipeline operation
speed, shift registers are required between the XOR levels in
order to store the XOR outputs of the intermediate levels.

Another approach to the realization of product function $f$
is to use a standard AND-OR PLA (Ref. 4). This is possible
since $x \oplus y = x\bar{y} \lor \bar{x}y$ where $\lor$ denotes inclusive OR. In general,
although the design of $f$ by the use of such a PLA is tedious,
the product function $f$ can be accomplished in less than one
clock cycle. One trade-off for such a design is the large chip
area required. The required area for such a PLA increases
dramatically with $m$. Hence, a design utilizing a standard
AND-OR PLA to realize $f$ is practical only for small $m$.
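
To give a rough feel for this growth, the sketch below counts the input combinations for which the $GF(2^4)$ product function of (12) equals 1, i.e. the number of minterms in its canonical (unminimized) sum-of-products form. This is only an upper bound on the product terms a minimized AND-OR PLA would need, and the comparison is an illustration added here, not a figure from the original design.

```python
from itertools import product


def f(b, c) -> int:
    """Product function of (12) for GF(2^4)."""
    b0, b1, b2, b3 = b
    c0, c1, c2, c3 = c
    return ((b2 & c2) ^ (b3 & c2) ^ (b2 & c3) ^ (b3 & c1) ^ (b1 & c3)
            ^ (b3 & c0) ^ (b0 & c3) ^ (b1 & c0) ^ (b0 & c1))


# Canonical sum-of-products: one 8-input AND term per input row where f = 1.
minterms = sum(f(v[:4], v[4:]) for v in product((0, 1), repeat=8))
print(minterms)  # 120, compared with 9 two-input AND terms in the XOR form
```

For each nonzero $\beta$, exactly half of the 16 values of $\gamma$ give $f = 1$, so the count is $(2^4 - 1) \cdot 2^3 = 120$; the same argument suggests $(2^m - 1) \cdot 2^{m-1}$ minterms in general.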


IV. A Pipeline Structure for Computing an
Inverse Element in the Finite Field
$GF(2^m)$

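For any $\alpha$ in the finite field $GF(2^m)$, $\alpha^{2^m} = \alpha$. Hence the
inverse of a nonzero $\alpha$ is $\alpha^{-1} = \alpha^{2^m - 2}$. Let $2^m - 2$ be decomposed as
$2 + 2^2 + 2^3 + \cdots + 2^{m-1}$; then $\alpha^{-1}$ can be expressed as

$$\alpha^{-1} = (\alpha^{2}) \cdot (\alpha^{2^2}) \cdot (\alpha^{2^3}) \cdots (\alpha^{2^{m-1}}) \qquad (13)$$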
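
As a short worked check of this decomposition (an illustration added for clarity), the exponent is a geometric sum, and for $m = 4$ it gives the three factors used later in the $GF(2^4)$ circuit:

```latex
\sum_{j=1}^{m-1} 2^{j} = 2^{m} - 2,
\qquad
m = 4:\quad 2 + 4 + 8 = 14 = 2^{4} - 2,
\qquad
\alpha^{-1} = \alpha^{14} = \alpha^{2} \cdot \alpha^{4} \cdot \alpha^{8}.
```

Since $\alpha^{15} = 1$ for every nonzero $\alpha$ in $GF(2^4)$, $\alpha \cdot \alpha^{14} = 1$ as required.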

As discussed in Section II, if $\alpha$ is represented in a normal basis,
squaring can be realized by a cyclic shift operation, and $\alpha^{2^j}$ is the
$j$-th cyclic shift (CS) of $\alpha$. Thus, the inverse element $\alpha^{-1}$ can
be obtained by using successive cyclic-shift operations and a
Massey-Omura multiplier. The algorithm for $\alpha^{-1}$ is the
following:

(1) Obtain the cyclic shift of $\alpha$, i.e., $\alpha^2 = CS(\alpha)$ where CS
denotes the cyclic shift function. Let $B = CS(\alpha)$ and
$C = 1$. Let $k = 0$.

(2) Multiply $B$ and $C$ to obtain the product, $D = B \cdot C$. Set
$k = k + 1$.

(3) If $k = m - 1$, $\alpha^{-1} = D$. Stop. If $k < m - 1$, let $B = CS(B)$
and $C = D$.

(4) Go back to (2).

Figure 8 shows a flow chart diagram of this procedure. 
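
The four steps above can be sketched directly for $GF(2^4)$. The code below is a software illustration under stated assumptions, not the circuit itself: it uses the product function of (12), and the names `cyclic_shift`, `multiply` and `inverse` are hypothetical.

```python
def f(b, c) -> int:
    """Product function of (12) for GF(2^4)."""
    b0, b1, b2, b3 = b
    c0, c1, c2, c3 = c
    return ((b2 & c2) ^ (b3 & c2) ^ (b2 & c3) ^ (b3 & c1) ^ (b1 & c3)
            ^ (b3 & c0) ^ (b0 & c3) ^ (b1 & c0) ^ (b0 & c1))


def cyclic_shift(v):
    """CS: squaring in the normal basis."""
    return v[-1:] + v[:-1]


def multiply(b, c):
    """Massey-Omura product via (8): d_3, d_2, d_1, d_0 from shifted inputs."""
    d = [0] * 4
    for k in range(4):
        d[3 - k] = f(b, c)
        b, c = cyclic_shift(b), cyclic_shift(c)
    return d


def inverse(a):
    """Steps (1)-(4): alpha^-1 = alpha^2 * alpha^4 * alpha^8 for m = 4."""
    m = len(a)
    B, C, k = cyclic_shift(a), [1] * m, 0   # step (1): B = CS(alpha), C = 1
    while True:
        D = multiply(B, C)                  # step (2)
        k += 1
        if k == m - 1:                      # step (3): done after m - 1 products
            return D
        B, C = cyclic_shift(B), D           # step (3), then back to (2)


a = [1, 0, 0, 0]                 # alpha itself
a_inv = inverse(a)
print(a_inv, multiply(a, a_inv)) # [1, 0, 0, 1] [1, 1, 1, 1]
```

The product of an element and its computed inverse is $[1, 1, 1, 1]$, the normal-basis representation of 1. For the zero element, which has no inverse, this sketch simply returns zero.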

This recursive algorithm for computing an inverse element
in $GF(2^4)$ can be realized using the circuit shown in Fig. 9. In
this circuit the parallel-type Massey-Omura multiplier shown in
Fig. 3 with the circuit for the product function $f$ shown in
Fig. 4 is utilized.

To illustrate, let Ld$_1$ and Ld$_2$ be two control signals with a
period of four clock cycles as shown in Fig. 9. Also let the
normal basis representation of $\alpha$ be $(a_0, a_1, a_2, a_3)$. At the end
of the third clock pulse, three of the components of $\alpha$ are stored in the
input buffer flip-flops $B_1$, $B_2$, $B_3$. During the
fourth clock cycle, $a_3$, $a_0$, $a_1$ and $a_2$ are simultaneously shifted
to $R_1$, $R_2$, $R_3$ and $R_4$, respectively. With the appropriate
connections among the input buffer flip-flops $B_k$ and flip-flops
$R_k$, the cyclic shift of $\alpha = (a_0, a_1, a_2, a_3)$, i.e., $\alpha^2 = (a_3, a_0,
a_1, a_2)$, is obtained in the $R$-registers. At the fourth clock pulse $R_5$, $R_6$, $R_7$,
$R_8$ are also fed the value "0". These four values, the complements
of "1", introduce the element 1 in $GF(2^4)$.

As was discussed in Section II, a parallel-type $GF(2^4)$
Massey-Omura multiplier simultaneously yields four product
components $d_0$, $d_1$, $d_2$, $d_3$. Therefore, during the next three
clocks three successive multiplications, i.e., $\beta_1 = 1 \cdot \alpha^2$, $\beta_2 =
\beta_1 \cdot \alpha^4$ and $\beta_3 = \beta_2 \cdot \alpha^8$, are performed for the inversion.
When the third multiplication is completed, Ld$_2$ = 1. Thus
the output product digits, which together represent the
inverse element $\alpha^{-1}$, are fed into the output buffer flip-flops
$B_k$. Finally these are sequentially shifted from the inversion
circuit.

The above technique for computing the inverse of an element
in $GF(2^4)$ takes four clock cycles. During these four
clock cycles, the circuit in Fig. 9 allows the bits of the next
element (following $\alpha$) to be fed into it and the bits of the
previous element to be shifted out of it, simultaneously. This
type of circuit provides a full pipeline capability. A VLSI
layout of the pipeline inversion circuitry for $GF(2^4)$ is presented
in Fig. 10. Figure 11 shows the system structure of an
inversion circuit for the general finite field $GF(2^m)$.


References 


1. Massey, J. L., and Omura, J. K., Patent Application of Computational Method and
Apparatus for Finite Field Arithmetic, submitted in 1981.

2. MacWilliams, F. J., and Sloane, N. J. A., The Theory of Error-Correcting Codes,
North-Holland Publishing, New York, 1977.

3. Peterson, W. W., and Weldon, E. J., Jr., Error-Correcting Codes, MIT Press, Cambridge, 
1972. 

4. Mead, C., and Conway, L., Introduction to VLSI Systems, Addison-Wesley, Reading, 
1980. 





Fig. 1. The squaring operation for a normal-basis representation over $GF(2^m)$



Fig. 2. System-logic diagram of a sequential-type Massey-Omura multiplier over $GF(2^m)$



Fig. 3. Architecture of parallel-type Massey-Omura multiplier over $GF(2^4)$

Fig. 11. System structure of a pipeline inversion circuitry for $GF(2^m)$